Exact representation of the second-order moments for resubstitution and leave-one-out error estimation for linear discriminant analysis in the univariate heteroskedastic Gaussian model

Authors

Amin Zollanvari

Ulisses Braga-Neto

Edward R. Dougherty

Abstract

This paper provides exact analytical expressions for the bias, variance, and RMS of the resubstitution and leave-one-out error estimators for linear discriminant analysis (LDA) in the univariate heteroskedastic Gaussian model. Neither the variances nor the sample sizes of the two classes need be the same. Allowing heteroskedasticity (unequal variances) is a fundamental feature of this work and distinguishes it from past work. The expected resubstitution and leave-one-out errors are represented by probabilities involving bivariate Gaussian distributions. Their second moments and cross-moments with the actual error are represented by 4-variate Gaussian distributions. From these, the bias, deviation variance, and RMS of resubstitution and leave-one-out as estimators of the actual error can be computed. The RMS expressions are applied to sample-size determination and to biomarker classification.
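The quantities named in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's exact bivariate and 4-variate Gaussian representation. It is only a simulation that approximates the same bias, deviation variance, and RMS by repeated sampling. The discriminant form W(x) = (x - (m0 + m1)/2)(m0 - m1), the equal class weights c0 = 0.5, and all function names are assumptions made for illustration.

```python
import math
import random


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mean(v):
    return sum(v) / len(v)


def classify(x, m0, m1):
    """Assumed univariate LDA rule: label 0 when W(x) >= 0."""
    w = (x - 0.5 * (m0 + m1)) * (m0 - m1)
    return 0 if w >= 0 else 1


def true_error(m0, m1, mu0, s0, mu1, s1, c0=0.5):
    """Actual error of the designed rule under N(mu0, s0^2) and N(mu1, s1^2)."""
    t = 0.5 * (m0 + m1)  # decision threshold
    if m0 > m1:          # class 0 region is x >= t
        e0, e1 = phi((t - mu0) / s0), 1.0 - phi((t - mu1) / s1)
    elif m0 < m1:        # class 0 region is x <= t
        e0, e1 = 1.0 - phi((t - mu0) / s0), phi((t - mu1) / s1)
    else:                # W(x) = 0 everywhere, so every point is labeled 0
        e0, e1 = 0.0, 1.0
    return c0 * e0 + (1.0 - c0) * e1


def resubstitution(x0, x1, c0=0.5):
    """Error of the designed rule counted on its own training sample."""
    m0, m1 = mean(x0), mean(x1)
    r0 = sum(classify(x, m0, m1) != 0 for x in x0) / len(x0)
    r1 = sum(classify(x, m0, m1) != 1 for x in x1) / len(x1)
    return c0 * r0 + (1.0 - c0) * r1


def leave_one_out(x0, x1, c0=0.5):
    """Each point is classified by the rule designed without that point."""
    n0, n1, t0, t1 = len(x0), len(x1), sum(x0), sum(x1)
    r0 = sum(classify(x, (t0 - x) / (n0 - 1), t1 / n1) != 0 for x in x0) / n0
    r1 = sum(classify(x, t0 / n0, (t1 - x) / (n1 - 1)) != 1 for x in x1) / n1
    return c0 * r0 + (1.0 - c0) * r1


def summarize(dev):
    """Bias, deviation variance, and RMS of the deviations (estimate - actual)."""
    bias = mean(dev)
    var = mean([(d - bias) ** 2 for d in dev])
    rms = math.sqrt(mean([d * d for d in dev]))
    return bias, var, rms


def simulate(n0, n1, mu0, s0, mu1, s1, trials=2000, seed=0):
    """Monte Carlo approximation; requires n0 >= 2 and n1 >= 2."""
    rng = random.Random(seed)
    dev = {"resubstitution": [], "leave-one-out": []}
    for _ in range(trials):
        x0 = [rng.gauss(mu0, s0) for _ in range(n0)]
        x1 = [rng.gauss(mu1, s1) for _ in range(n1)]
        eps = true_error(mean(x0), mean(x1), mu0, s0, mu1, s1)
        dev["resubstitution"].append(resubstitution(x0, x1) - eps)
        dev["leave-one-out"].append(leave_one_out(x0, x1) - eps)
    return {name: summarize(d) for name, d in dev.items()}


if __name__ == "__main__":
    # Unequal variances and unequal sample sizes, as the model allows.
    for name, (bias, var, rms) in simulate(8, 12, 0.0, 1.0, 1.0, 2.0).items():
        print(f"{name}: bias={bias:.4f} var={var:.4f} rms={rms:.4f}")
```

Each estimator's three summaries satisfy RMS^2 = bias^2 + deviation variance, which is why the second moments and the cross-moments with the actual error are enough to obtain the RMS.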

Similar resources

On the sampling distribution of resubstitution and leave-one-out error estimators for linear classifiers

Error estimation is a problem of high current interest in many areas of application. This paper concerns the classical problem of determining the performance of error estimators in small-sample settings under a Gaussianity parametric assumption. We provide here for the first time the exact sampling distribution of the resubstitution and leave-one-out error estimators for linear discriminant ana...

Joint Sampling Distribution Between Actual and Estimated Classification Errors for Linear Discriminant Analysis

Error estimation must be used to find the accuracy of a designed classifier, an issue that is critical in biomarker discovery for disease diagnosis and prognosis in genomics and proteomics. This paper presents, for what is believed to be the first time, the analytical formulation for the joint sampling distribution of the actual and estimated errors of a classification rule. The analysis presen...

Unbiased bootstrap error estimation for linear discriminant analysis

Convex bootstrap error estimation is a popular tool for classifier error estimation in gene expression studies. A basic question is how to determine the weight for the convex combination between the basic bootstrap estimator and the resubstitution estimator such that the resulting estimator is unbiased at finite sample sizes. The well-known 0.632 bootstrap error estimator uses asymptotic argume...

Is cross-validation better than resubstitution for ranking genes?

Motivation: Ranking gene feature sets is a key issue for both phenotype classification, for instance, tumor classification in a DNA microarray experiment, and prediction in the context of genetic regulatory networks. Two broad methods are available to estimate the error (misclassification rate) of a classifier. Resubstitution fits a single classifier to the data, and applies this classifier in t...

Relationship between the accuracy of classifier error estimation and complexity of decision boundary

Error estimation is a crucial part of classification methodology and it becomes problematic with small samples. We demonstrate here that the complexity of the decision boundary plays a key role on the performance of error estimation methods. First, a model is developed which quantifies the complexity of a classification problem purely in terms of the geometry of the decision boundary, without r...

Journal:

Pattern Recognition

Volume 45

Publication date: 2012
